Novel Selection Methods for Monte-Carlo Tree Search
Abstract
Preface

In this thesis I present the results of my investigation into regret minimization for Monte-Carlo Tree Search. The thesis presents the motivation, background, and formal definition of a novel search technique based on minimizing both simple and cumulative regret in a game tree: Hybrid MCTS (H-MCTS). The technique minimizes the two types of regret in a single search tree. This ensures that recommendations made by the algorithm have low simple regret, while at the same time internal nodes are sampled efficiently. It was developed for, and tested in, six two-player games.

Special thanks go to both Dr. Mark Winands and Dr. Marc Lanctot for providing the inspiration and guidance required to develop this novel algorithm. Their combined experience was crucial to obtaining the results presented in this work. Thanks go to Prof. Dr. Tristan Cazenave for his time and assistance with the implementation of SHOT, and for the experiments he performed in his award-winning engine. Thanks also to Dr. Steve Kroon for his insightful input and assistance in proofreading the work. Moreover, I would like to thank my wife Priscilla for her support, and for her patience and understanding. Without both her emotional and financial assistance you would not be reading this thesis.

Summary

Monte-Carlo Tree Search (MCTS) is a best-first search technique that bases decisions on sampling the state space of a domain. In many domains, MCTS has proven to be an effective approach when complex decision-making based on future rewards and outcomes is required. The technique was initially inspired by algorithms used to solve multi-armed bandit (MAB) problems. Such a problem can be described as a single-ply MCTS search, in which an agent is given a choice of options (arms), each with its own probability distribution.
Sampling an arm returns a random result from its underlying distribution, and the goal of the agent is to maximize its reward and/or recommend the arm with the most rewarding distribution. Depending on the context of the MAB problem, the agent's goal is either to minimize simple regret, i.e., the regret of not recommending the best action, or cumulative regret, i.e., the regret accumulated over time. Applying this theory to MCTS, however, may require more consideration. In a recursive MAB (such as MCTS), where the distribution of each arm is based on an underlying, growing search tree, minimizing a single type of regret throughout the tree implies that at each ply of …
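As a concrete illustration of the two regret notions in the summary above, the following sketch (not code from the thesis; the arm means, budget, and UCB1 policy are assumptions for illustration) runs a Bernoulli bandit with UCB1 selection and then measures both the cumulative regret of the sampling and the simple regret of the final recommendation:

```python
import math
import random

def ucb1_bandit(means, budget, c=math.sqrt(2), seed=0):
    """Sample Bernoulli arms with UCB1 and report both types of regret.

    UCB1 targets cumulative regret: it repeatedly pulls the arm with the
    highest upper confidence bound  mean + c*sqrt(ln(t)/n).  Simple
    regret is judged only by the final recommendation, taken here to be
    the most-pulled arm.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total reward per arm

    for t in range(1, budget + 1):
        if t <= k:            # pull every arm once to initialize
            arm = t - 1
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + c * math.sqrt(math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward

    best = max(means)
    # Cumulative regret: expected reward lost over all pulls.
    cumulative = sum(counts[a] * (best - means[a]) for a in range(k))
    # Simple regret: loss incurred by the recommended arm alone.
    recommended = max(range(k), key=lambda a: counts[a])
    simple = best - means[recommended]
    return recommended, simple, cumulative
```

With a sufficient budget the recommendation's simple regret drops to zero even though cumulative regret keeps growing slowly with every exploratory pull, which is exactly the tension between the two objectives that H-MCTS addresses at different plies of the tree.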
Similar resources
Cooperative Games with Monte Carlo Tree Search
A Monte Carlo Tree Search approach with Pareto optimality and a pocket algorithm is used to solve and optimize the multi-objective constraint-based staff scheduling problem. The proposed approach has a two-stage selection strategy, and the experimental results show that the approach is able to produce solutions for cooperative games.
Monte-Carlo Tree Search: Applied to Domineering and Tantrix
Contents (excerpt): Chapter 1: Introduction; The Rules of Tantrix; …
Monte-Carlo Exploration for Deterministic Planning
Search methods based on Monte-Carlo simulation have recently led to breakthrough performance improvements in difficult game-playing domains such as Go and General Game Playing. Monte-Carlo Random Walk (MRW) planning applies Monte-Carlo ideas to deterministic classical planning. In the forward chaining planner ARVAND, Monte-Carlo random walks are used to explore the local neighborhood of a search ...
Thompson Sampling Based Monte-Carlo Planning in POMDPs
Monte-Carlo tree search (MCTS) has been drawing great interest in recent years for planning under uncertainty. One of the key challenges is the tradeoff between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson sampling based MCTS that balances between cumulative and simple regrets. The proposed algorithm — Dirichlet-Di...
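The snippet above leaves the Thompson-sampling mechanism implicit. As a generic illustration (a minimal Beta-Bernoulli sketch under assumed arm means; it is not the Dirichlet-based algorithm the abstract names), each arm keeps a Beta posterior and the arm with the highest posterior draw is pulled, which balances exploration and exploitation without an explicit confidence bound:

```python
import random

def thompson_bandit(means, budget, seed=0):
    """Beta-Bernoulli Thompson sampling over Bernoulli arms.

    Each arm keeps a Beta(wins+1, losses+1) posterior.  Every step draws
    one sample from each posterior and pulls the arm with the highest
    draw; uncertain arms occasionally win the draw, so exploration
    decays naturally as evidence accumulates.
    """
    rng = random.Random(seed)
    k = len(means)
    wins = [0] * k
    losses = [0] * k
    for _ in range(budget):
        draws = [rng.betavariate(wins[a] + 1, losses[a] + 1)
                 for a in range(k)]
        arm = max(range(k), key=lambda a: draws[a])
        if rng.random() < means[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    # Recommend the arm with the highest posterior mean.
    return max(range(k),
               key=lambda a: (wins[a] + 1) / (wins[a] + losses[a] + 2))
```

For example, `thompson_bandit([0.2, 0.8], 500)` concentrates its pulls on the second arm and recommends it, illustrating why posterior sampling is attractive as a selection policy inside MCTS-style planners.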
Stochastic Planning in Large Search Spaces
Multi-agent planning approaches are employed for many problems including task allocation, surveillance and video games. In the first part of my thesis, we study two multi-robot planning problems, i.e. patrolling and task allocation. For the patrolling problem, we present a novel stochastic search technique, Monte Carlo Tree Search with Useful Cycles, that can generate optimal cyclic patrol poli...
Metareasoning for Monte Carlo Tree Search
Sequential decision problems are often approximately solvable by simulating possible future action sequences; such methods are a staple of game-playing algorithms, robot path planners, model-predictive control systems, and logistical planners in operations research. Since the 1960s, researchers have sought effective metareasoning methods for selecting which action sequences to simulate, basing t...
Publication date: 2014